Abstract. Biclustering involves the simultaneous clustering of objects
and their attributes, thus defining local two-way clustering models.
Recently, efficient algorithms were conceived to enumerate all biclusters in
real-valued datasets. In this case, the solution composes a complete set
of maximal and non-redundant biclusters. However, the ability to
enumerate biclusters revealed a challenging scenario: in noisy datasets, each
true bicluster may become highly fragmented and with a high degree of
overlapping. This prevents a direct analysis of the obtained results.
Aiming at reverting the fragmentation, we propose here two approaches for
properly aggregating the whole set of enumerated biclusters: one based
on single linkage and the other directly exploring the rate of overlapping.
Both proposals were compared with each other and with the current state
of the art in several experiments, and they not only significantly reduced
the number of biclusters but also consistently increased the quality of the
solution.

Keywords: Biclustering, bicluster enumeration, bicluster aggregation, 
outlier removal, metrics for biclusters. 


1 Introduction 

Biclustering techniques aim to simultaneously cluster objects and attributes of a 
dataset. Each bicluster is represented as a tuple containing a subset of the rows, 
and a subset of the columns, as long as they exhibit some kind of coherence 
pattern. There are several kinds of coherence which can be found in a
bicluster, and they directly influence the mechanism of bicluster identification.
As finding all biclusters in a dataset is an NP-hard problem, several heuristics
were proposed, such as CC [1] and FLOG [2]. Such heuristics may miss important
biclusters, and may also return non-maximal biclusters (biclusters that can be
further augmented).

In the case of binary datasets, there are plenty of algorithms for
enumerating all maximal biclusters. Some examples are Makino & Uno [3], LCM [3]
and In-Close2 [5]. The enumeration of all maximal biclusters in an integer or



On bicluster aggregation and its benefits for enumerative solutions 


real-valued dataset is a much more challenging scenario, but we already have
some proposals, such as RIn-Close [6] and RAP [7].

The drawback of enumerative algorithms, particularly in the context of noisy
datasets, is the existence of a large number of biclusters, due to fragmentation
of a much smaller number of true biclusters. This is exemplified in one of our
experiments, where we take artificial datasets, gradually increase the variance
of a Gaussian noise, and get the enumerative result. As shown in Fig. 2, with
enough noise, the enumerative results exhibit a strong increase in the quantity
of biclusters. This fragmentation leads to a challenging scenario for the analysis
of the results, which can become impractical even in small datasets. In fact,
the noise is responsible for fragmenting each true bicluster into many with high
overlapping, so that the aggregation of these biclusters is recommended.

We propose a way of aggregating biclusters from a biclustering result that
shows a high overlapping among its components, as is the case when
enumerating biclusters in noisy datasets. For this reason, in this paper we focus
on enumerative results, but our proposal can be applied to the result of any
algorithm that returns biclusters with high overlapping among them. The
formulation is based on the fact that the high overlapping among biclusters may
indicate that they are fragments of a true bicluster that should be reconstructed.
We propose two different techniques to perform the aggregation, followed by a
step that removes elements that should not be part of a bicluster. We performed
experiments with three artificial datasets posing different challenges, and two
real datasets from distinct backgrounds. We compared our proposals with a
bicluster ensemble algorithm, and the merging/deleting steps of MicroCluster [5].
The experimental results show that the aggregation not only severely reduces
the quantity of biclusters, but also tends to increase the quality of the solution.

The paper is organized as follows. In Section 2 we give the main definitions
and discuss the related works in the literature. Section 3 outlines our proposals.
The metrics used to evaluate our proposals are presented in Section 4. In
Section 5 we present the experimental procedure and the obtained results of
the experiments. Concluding remarks and future work are outlined in Section 6.


2 Definitions and Related Work 

Consider a dataset $A \in \mathbb{R}^{n \times m}$ with rows
$X = \{x_1, x_2, \ldots, x_n\}$ and columns $Y = \{y_1, y_2, \ldots, y_m\}$.
We define a bicluster $B = (B^r, B^c)$, where $B^r \subseteq X$ and
$B^c \subseteq Y$, such that the elements in the bicluster show a coherence
pattern. A bicluster solution is a set of biclusters represented by
$\bar{B} = \{B_i\}_{i=1}^{q}$, containing $q$ biclusters.

A bicluster is maximal if and only if we cannot include any other object or
attribute without violating the coherence threshold. If a solution contains
non-maximal biclusters, the result is redundant because there will be biclusters
which are part of larger ones.

Madeira & Oliveira uni categorized the types of biclusters according to their 
similarity patterns. They also categorized the bicluster structures in a dataset
based on their disposition and level of overlapping. We highlight that biclusters 
with constant values, constant values on rows, or constant values on columns are
special cases of biclusters with coherent values, and we will focus our attention 
on the latter, due to its generality. For a comprehensive survey of biclustering 
algorithms, the reader may refer to m and m- 

The overlapping between two biclusters $B$ and $C$ is an important concept in
this work, and is defined as:

$$ov(B, C) = \frac{|(B^r \cap C^r) \times (B^c \cap C^c)|}{\min(|B^r \times B^c|, |C^r \times C^c|)}. \qquad (1)$$
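As a minimal sketch of the overlapping measure, assuming a bicluster is stored as a pair of index sets (rows, columns) and is never empty, the overlapping rate can be computed from set sizes alone. The names `Bicluster`, `area` and `ov` are illustrative choices, not part of any published implementation.

```python
from typing import FrozenSet, Tuple

# Assumed representation: (row indices, column indices), both non-empty.
Bicluster = Tuple[FrozenSet[int], FrozenSet[int]]


def area(b: Bicluster) -> int:
    """Number of cells covered by the bicluster: |rows| * |columns|."""
    rows, cols = b
    return len(rows) * len(cols)


def ov(b: Bicluster, c: Bicluster) -> float:
    """Shared area divided by the area of the smaller bicluster."""
    shared = len(b[0] & c[0]) * len(b[1] & c[1])
    return shared / min(area(b), area(c))


B = (frozenset({0, 1, 2, 3}), frozenset({0, 1, 2}))  # 4 x 3 = 12 cells
C = (frozenset({2, 3, 4}), frozenset({1, 2, 3}))     # 3 x 3 = 9 cells
# Shared rows {2, 3} and shared columns {1, 2} give 4 cells, so ov = 4 / 9.
print(ov(B, C))
```

Dividing by the smaller area means that a bicluster fully contained in another one always has an overlapping rate of 1 with it.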


Now we shall proceed to the aggregation proposals in the literature. It is
important to highlight that, when aggregating two maximal biclusters, the
coherence threshold will be violated. Otherwise, the biclusters would not be
maximal.


2.1 MicroCluster Aggregation 

MicroCluster [3] is an enumerative proposal that has two additional steps after
the enumeration. These steps delete or merge biclusters whose covered area does
not differ much from that of other biclusters. The first is the deleting step:
if, for a given bicluster, the ratio between the area that is not covered by any
other bicluster and its total area is less than a threshold $\eta$, the
bicluster can be removed. The second is the merging step. Consider two
biclusters and generate a third one with the union of their rows and columns.
If the ratio between the area of the third bicluster that is not covered by
either of the two original biclusters and its total area is less than a
threshold $\gamma$, the two biclusters can be aggregated into the third one. In
this method of aggregation, non-maximal biclusters are removed in the deleting
step, and thus do not interfere with the final result. For more details, please
refer to Zhao & Zaki [3].

2.2 Aggregation Using Triclustering 

Triclustering was proposed by Hanczar & Nadif [13] as a biclustering ensemble
algorithm. First, they transform each bicluster into a binary matrix. After that,
they propose a triclustering algorithm to find the k most relevant biclusters. As
they were able to improve the biological relevance of biclustering for microarray
data [14], we will use this method as a contender in this paper. One major point
in ensembles is that we want to combine the results, reinforcing the biclusters
that seem to be important for several components, and discarding the ones that
may come from noise. Due to the way the triclustering algorithm handles the
optimization step, non-maximal biclusters can interfere in the final results.

Bicluster aggregation is slightly different from bicluster ensemble. While in
ensemble tasks we discard biclusters that seem unimportant and combine the
ones that contribute the most to the solution, in bicluster aggregation we never
discard any bicluster. Given this characteristic, the bicluster ensemble solution
is expected to show a high Precision with an impacted Recall (see Section 4),
as it eliminates biclusters.
2.3 Other Aggregation Methods

Gao & Akoglu [12] used the principle of Minimum Description Length to
propose CoClusLSH, an algorithm that returns a hierarchical set of biclusters. The
hierarchical part can be seen as an aggregation step. This step is based on
the LSH technique as a hash function. Candidates hashed to the same bucket
are then aggregated until no merging improves the final solution. Their work
is focused on finding biclusters in a checkerboard structure, which does not allow
overlapping, and is thus not suitable for the kind of problem we are dealing with.

Liu et al. [5] proposed OPC-Tree, a deterministic algorithm to mine Order
Preserving Clusters (OP-Clusters), a general case of the Order Preserving
Submatrices (OPSM) type of biclusters. They also have an additional step for
creating a hierarchical aggregation of the OP-Clusters. The Kendall coefficient
is used to determine which clusters should be merged and in which order the
objects should participate in the resultant OP-Cluster. The higher the rank
correlation given by the Kendall coefficient, the higher the similarity between
two OP-Clusters. The merging is allowed according to a threshold that is reduced
in a level-wise way. OPC-Tree considers the order of the rows in the bicluster.
In this work, we are dealing with biclusters of coherent values. A perfect
coherent values bicluster keeps the order of its rows, so the hierarchical step
of OPC-Tree could be used in this case as well. However, we are considering
noisy datasets, in which this assumption will probably not hold, and thus the
hierarchical step of OPC-Tree is not suitable for the problem we are dealing with.


3 New Proposals for Aggregation 

3.1 Aggregation with Single Linkage 

Our first proposal receives as input a biclustering solution $\bar{B}$, from
enumeration or from a result presenting high overlapping among its components.
We transform each bicluster of this solution into a binary vector representation
as follows. Given the dimensions of the dataset $A \in \mathbb{R}^{n \times m}$,
each bicluster becomes a binary vector $x$ of length $n + m$. For a bicluster
$B$ transformed into the binary vector $x$, the first $n$ positions represent
the rows of the dataset $A$: if the bicluster contains the $i$th row, $x_i = 1$,
otherwise $x_i = 0$. The last $m$ positions represent the columns of the dataset
$A$: if the bicluster contains the $i$th column, $x_{n+i} = 1$, otherwise
$x_{n+i} = 0$. After this transformation, we use the Hamming distance to apply
single linkage clustering to the existing biclusters. Notice that the Hamming
distance on this representation simply counts how many rows and columns differ
between two biclusters. Consequently, a non-maximal bicluster may be distant
from the bicluster that covers its maximal area, which degrades the quality of
the results of this method of aggregation. For this reason, this proposal must
receive a biclustering solution $\bar{B}$ containing only maximal biclusters.
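The encoding and grouping just described can be sketched in plain Python. The sketch below is an assumption-laden illustration: it obtains a cut of the single linkage dendrogram into `k` groups by merging the closest pairs first with a union-find structure (which yields the same partition as cutting a single linkage dendrogram, up to tie-breaking), and the names `to_binary`, `hamming` and `single_linkage_labels` are hypothetical.

```python
from itertools import combinations


def to_binary(b, n, m):
    """Encode a bicluster (rows, cols) as a 0/1 vector of length n + m."""
    rows, cols = b
    return ([1 if i in rows else 0 for i in range(n)]
            + [1 if j in cols else 0 for j in range(m)])


def hamming(u, v):
    """Number of positions (rows or columns) where the two vectors differ."""
    return sum(a != b for a, b in zip(u, v))


def single_linkage_labels(vectors, k):
    """Group the vectors into k clusters by merging closest pairs first."""
    parent = list(range(len(vectors)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    pairs = sorted(combinations(range(len(vectors)), 2),
                   key=lambda p: hamming(vectors[p[0]], vectors[p[1]]))
    groups = len(vectors)
    for i, j in pairs:
        if groups <= k:
            break
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[rj] = ri
            groups -= 1
    return [find(i) for i in range(len(vectors))]


# Toy solution on a 6 x 4 dataset: two fragments each of two true biclusters.
solution = [
    (frozenset({0, 1, 2}), frozenset({0, 1})),
    (frozenset({1, 2, 3}), frozenset({0, 1})),
    (frozenset({4, 5}), frozenset({2, 3})),
    (frozenset({4, 5}), frozenset({1, 2, 3})),
]
vectors = [to_binary(b, n=6, m=4) for b in solution]
labels = single_linkage_labels(vectors, k=2)
print(labels)
```

For large enumerations the quadratic number of pairs becomes the bottleneck, so a dedicated hierarchical clustering library would likely be preferable in practice.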
After choosing a cut on the dendrogram, we aggregate all biclusters that
belong to a junction using the function aggreg, defined as:

$$aggreg(B, C) = (B^r \cup C^r, B^c \cup C^c), \qquad (2)$$

that is simply the union of rows / columns of the biclusters. It is important
to note that the aggreg function is associative, since it is based on the union
operation. Moreover, we want to highlight that the direct union of rows / columns
may include elements that should not be part of a bicluster. In Section 3.3 we
will present a way to remove rows / columns that may be interpreted as outliers.
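A short sketch of the union-based aggregation, assuming biclusters are pairs of index sets; `aggregate_group` is a hypothetical helper that folds the function over all biclusters of one junction.

```python
from functools import reduce


def aggreg(b, c):
    """Union of the rows and of the columns of two biclusters (rows, cols)."""
    return (b[0] | c[0], b[1] | c[1])


def aggregate_group(group):
    """Aggregate every bicluster assigned to the same junction."""
    return reduce(aggreg, group)


B = (frozenset({0, 1}), frozenset({0, 1}))
C = (frozenset({1, 2}), frozenset({1, 2}))
D = (frozenset({2, 3}), frozenset({0}))
print(aggregate_group([B, C, D]))
```

Because set union is associative and commutative, the order in which the biclusters of a group are folded does not change the aggregated bicluster.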

3.2 Aggregation by Overlapping 

It seems intuitive to aggregate the biclusters with an overlapping rate above
a defined threshold. This proposal is based on the aggregation by pairs: while
there are two biclusters with an overlapping rate higher than a pre-determined
threshold $th$, we remove them from the set of biclusters, and include the
result of the function aggreg, defined in Eq. (2), taking these two biclusters
as the arguments.

Let $B$, $C$, $D$, and $E$ be biclusters. Note that for $D = aggreg(B, C)$,
$ov(D, E) \geq ov(B, E)$ and $ov(D, E) \geq ov(C, E)$. So, for all biclusters
$E$ where $ov(B, E) > th$ or $ov(C, E) > th$, we have $ov(D, E) > th$. For this
reason, the order of the aggregation does not interfere with the final result.
It is also important to note that the new bicluster $D$ can have
$ov(D, E) > th$ for some bicluster $E$ where $ov(B, E) < th$ and
$ov(C, E) < th$. In this aggregation proposal, maximal biclusters will properly
merge with non-maximal biclusters.
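The pairwise procedure can be sketched as a loop that keeps merging until no pair exceeds the threshold. This is a hedged illustration: the scan order over pairs and the function name `aggregate_by_overlap` are choices made here, and biclusters are assumed to be non-empty pairs of index sets.

```python
def ov(b, c):
    """Overlapping rate: shared area over the area of the smaller bicluster."""
    shared = len(b[0] & c[0]) * len(b[1] & c[1])
    return shared / min(len(b[0]) * len(b[1]), len(c[0]) * len(c[1]))


def aggreg(b, c):
    """Union of rows and columns of two biclusters."""
    return (b[0] | c[0], b[1] | c[1])


def aggregate_by_overlap(biclusters, th):
    """Merge pairs with ov > th until no such pair remains."""
    pool = list(biclusters)
    merged = True
    while merged:
        merged = False
        for i in range(len(pool)):
            for j in range(i + 1, len(pool)):
                if ov(pool[i], pool[j]) > th:
                    new = aggreg(pool[i], pool[j])
                    # Drop the two merged biclusters and add their union.
                    pool = [b for k, b in enumerate(pool) if k not in (i, j)]
                    pool.append(new)
                    merged = True
                    break
            if merged:
                break
    return pool


fragments = [
    (frozenset({0, 1, 2, 3}), frozenset({0, 1, 2})),
    (frozenset({1, 2, 3, 4}), frozenset({0, 1, 2})),
    (frozenset({0, 1, 2, 3, 4}), frozenset({1, 2})),
    (frozenset({10, 11}), frozenset({5, 6})),
]
result = aggregate_by_overlap(fragments, th=0.7)
print(len(result))
```

Each merge shrinks the pool by one bicluster, so the loop always terminates; in the toy example the three fragments collapse into one bicluster and the isolated one is left untouched.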

3.3 Outlier Removal 

After aggregating the results, we need to process each final bicluster to look for 
objects and / or attributes that may be interpreted as outliers. In this work, 
this step will always be executed after the aggregation using any of our two 
proposals. 

Let $B = (B^r, B^c)$ be an aggregated bicluster, with $|B^r| = o$ and
$|B^c| = p$. We define a participation matrix $P \in \mathbb{N}^{o \times p}$,
where each element $P_{ij}$ indicates the number of biclusters composing $B$ in
which this element takes part. For example, if an element is part of 15
biclusters that compose $B$, then its value in the matrix $P$ will be 15.

We explain the process of outlier removal with the help of Figure 1. There are
two steps of outlier removal: one for the objects, the other for the attributes.
To remove possible outlier objects, we take the mean and the standard deviation
of each column of the participation matrix $P$. The left side of Figure 1
illustrates this step. After that, we check the value of each element of each
column. If the value is less than the mean minus one standard deviation of its
column, we mark this element as a potential outlier. In Figure 1 we can see that
the entire first row was marked as a potential outlier because $1 < 7.75 - 4$.
If an entire row is marked as a potential outlier, it is removed from the
bicluster. In our example, that is the case.

Fig. 1: Example of outlier removal. (a) Calculating the mean and standard
deviation of each column. (b) All elements marked as potential outliers.

We execute the same process for the columns, calculating the mean, standard 
deviation and checking for potential outliers on the rows. We remove the column 
if it is entirely marked as a potential outlier. 
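The outlier removal step can be sketched as below. Two details are assumptions of this sketch rather than statements of the text: the population standard deviation is used, and the row step and the column step are both computed on the same participation matrix. The names `participation`, `flagged` and `remove_outliers` are hypothetical.

```python
from statistics import mean, pstdev


def participation(agg, fragments):
    """P[i][j] = number of fragments containing cell (rows[i], cols[j])."""
    rows, cols = sorted(agg[0]), sorted(agg[1])
    P = [[sum(1 for f in fragments if r in f[0] and c in f[1]) for c in cols]
         for r in rows]
    return rows, cols, P


def flagged(M):
    """flag[i][j] is True when M[i][j] < mean - std of column j of M."""
    limits = [mean(col) - pstdev(col) for col in zip(*M)]
    return [[value < limits[j] for j, value in enumerate(row)] for row in M]


def remove_outliers(agg, fragments):
    rows, cols, P = participation(agg, fragments)
    # Objects: compare each element with the statistics of its column,
    # then drop the rows that are flagged in every column.
    keep_rows = [r for r, f in zip(rows, flagged(P)) if not all(f)]
    # Attributes: same test on the transpose, i.e. using row statistics,
    # then drop the columns that are flagged in every row.
    Pt = [list(col) for col in zip(*P)]
    keep_cols = [c for c, f in zip(cols, flagged(Pt)) if not all(f)]
    return (frozenset(keep_rows), frozenset(keep_cols))


all_cols = frozenset({0, 1, 2})
fragments = [
    (frozenset({1, 2, 3, 4}), all_cols),
    (frozenset({1, 2, 3}), all_cols),
    (frozenset({2, 3, 4}), all_cols),
    (frozenset({1, 2, 4}), all_cols),
    (frozenset({0, 1}), all_cols),   # the only fragment bringing row 0
]
agg = (frozenset({0, 1, 2, 3, 4}), all_cols)
print(remove_outliers(agg, fragments))
```

In this toy case every column of the participation matrix holds the counts 1, 4, 4, 3, 3, so only row 0 falls below the mean minus one standard deviation in all columns and is removed.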

4 Metrics for Biclustering 

In this paper we will use only external metrics, except for the Gene Ontology 
Enrichment Analysis (GOEA). External metrics compare a given solution with 
a reference one. For an extensive comparison of external metrics for biclustering 
solutions, the reader may refer to [15].

The Gene Ontology Project (GO) is an initiative to develop a computational
representation of the knowledge of how genes encode biological functions at the
molecular, cellular and tissue system levels. The GOEA compares a set of genes
with known information. For example, given a set of genes that are up-regulated
under certain conditions, an enrichment analysis will find which GO terms are
over-represented (or under-represented) using annotations for that gene set.
This method is commonly used to analyze results from biclustering techniques
on microarray gene expression datasets.

Precision, Recall and F-score are often used on information retrieval for 
measuring binary classification m- If we take pairs of elements, we can extend 
these metrics to evaluate clustering / biclustering solutions with overlapping. The 
pairwise definition of Precision and Recall can be found in m- It is important to 
highlight that these metrics do not consider the quantity of biclusters. Pairwise 
Precision, or just Precision for simplicity, is the fraction of retrieved pairs that 
are relevant; while Pairwise Recall, or just Recall for simplicity, is the fraction of 
relevant pairs that are retrieved. The F-score is the harmonic mean of Precision 
and Recall. 
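To make the pairwise reading concrete, the sketch below assumes that a pair of dataset elements is "retrieved" when both cells appear together in at least one found bicluster, and "relevant" when they appear together in at least one reference bicluster. This interpretation and the function names are assumptions; the exact definition is the one given in the cited reference.

```python
from itertools import combinations


def co_clustered_pairs(solution):
    """Unordered pairs of cells that share at least one bicluster."""
    pairs = set()
    for rows, cols in solution:
        # Sorting gives every unordered pair a single canonical form.
        cells = sorted((r, c) for r in rows for c in cols)
        pairs.update(combinations(cells, 2))
    return pairs


def pairwise_scores(found, reference):
    retrieved = co_clustered_pairs(found)
    relevant = co_clustered_pairs(reference)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved)
    recall = hits / len(relevant)
    total = precision + recall
    f_score = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_score


reference = [(frozenset({0, 1}), frozenset({0, 1}))]   # 4 cells, 6 pairs
found = [(frozenset({0, 1, 2}), frozenset({0, 1}))]    # 6 cells, 15 pairs
precision, recall, f_score = pairwise_scores(found, reference)
print(precision, recall, f_score)
```

The found bicluster contains the whole reference plus one extra row, so all 6 relevant pairs are retrieved (Recall 1) while only 6 of the 15 retrieved pairs are relevant (Precision 0.4). Enumerating pairs explicitly is quadratic in the bicluster area, which is only practical for small examples.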

Clustering Error (CE) is an external metric that considers the quantity of
biclusters in its evaluation. This metric severely penalizes a solution with more 
biclusters than the reference, thus not being recommended for evaluating enu¬ 
merative results. The definition and more details can be found in m- 

^ http://geneontology.org 

^ http://geneontology.org/page/about Accessed on January 16, 2015
We propose the difference in coverage, which measures what the reference
biclustering solution covers and the found biclustering solution does not cover,
and vice versa. Although very similar to the pairwise definitions of Precision
and Recall, this metric gives a more intuitive idea of how two solutions cover
distinct areas of the dataset. It can also be computed much faster.
Let $U_{\bar{B}} = \bigcup_{i=1}^{q} B_i^r \times B_i^c$ be the usual union set
of a biclustering solution $\bar{B}$. Let $\bar{B}$ and $\bar{C}$ be the found
and the reference biclustering solutions, respectively. Then the difference in
coverage is given by:

$$dif\_cov(\bar{B}, \bar{C}) = \frac{|U_{\bar{B}} - U_{\bar{C}}| + |U_{\bar{C}} - U_{\bar{B}}|}{m \times n}. \qquad (3)$$


We will use this measure to verify how different an aggregated solution is 
from the enumerative one. 
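A minimal sketch of the difference in coverage, assuming each solution is a list of (rows, columns) index-set pairs and that `n` and `m` are the numbers of rows and columns of the dataset; the function names are illustrative.

```python
def union_set(solution):
    """All (row, column) cells covered by at least one bicluster."""
    return {(r, c) for rows, cols in solution for r in rows for c in cols}


def dif_cov(found, reference, n, m):
    """Cells covered by exactly one of the two solutions, over the n x m area."""
    u_found, u_ref = union_set(found), union_set(reference)
    return (len(u_found - u_ref) + len(u_ref - u_found)) / (m * n)


found = [(frozenset({0, 1}), frozenset({0, 1}))]
reference = [(frozenset({1, 2}), frozenset({0, 1}))]
# Row 0 is covered only by `found` and row 2 only by `reference`:
# 2 + 2 cells out of a 4 x 3 dataset.
print(dif_cov(found, reference, n=4, m=3))
```

Only the two union sets are needed, which is why this measure is cheaper than counting co-clustered pairs.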


5 Experiments 

In our experiments, we employed three artificial datasets: art1, art2, and art3;
and two real datasets: GDS2587 and FOOD. We designed the artificial datasets
to present different scenarios with increasing difficulty. They have 1000 objects
and 15 attributes. Each entry is a random integer, drawn from a discrete uniform
distribution on the set {1, 2, ..., 100}. Then we inserted: 5 biclusters
arbitrarily positioned and without overlapping on art1; 5 biclusters arbitrarily
positioned and with a similar degree of overlapping on art2; and 15 biclusters
arbitrarily positioned and with different degrees of overlapping on art3.

For each bicluster, the quantity of objects was randomly drawn from the set 
{50,..., 60}, and the quantity of attributes was randomly drawn from the set 
{4, 5, 6, 7}. To insert a bicluster, we fixed the value of the first attribute
and obtained the values of the other attributes by adding a constant value to
the first column. This characterizes biclusters of coherent values. This
constant value was randomly drawn from the set
$\{-10, -9, \ldots, -1, 1, \ldots, 9, 10\}$.
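The generation procedure can be sketched as follows. Several details are assumptions of this sketch: the existing values of the first selected column are kept as the base, each remaining column receives its own constant shift, and no control over the overlapping between inserted biclusters is implemented. The function names and the seed are hypothetical.

```python
import random


def make_dataset(n=1000, m=15, rng=None):
    """n x m matrix of integers drawn uniformly from {1, ..., 100}."""
    rng = rng or random.Random(0)
    return [[rng.randint(1, 100) for _ in range(m)] for _ in range(n)]


def insert_bicluster(data, rng):
    """Overwrite a random submatrix with a coherent-values pattern.

    Returns the chosen rows and columns; cols[0] is the base column.
    """
    n, m = len(data), len(data[0])
    rows = rng.sample(range(n), rng.randint(50, 60))
    cols = rng.sample(range(m), rng.randint(4, 7))
    shifts = [s for s in range(-10, 11) if s != 0]
    base = cols[0]
    for c in cols[1:]:
        shift = rng.choice(shifts)      # one constant per column
        for r in rows:
            data[r][c] = data[r][base] + shift
    return rows, cols


rng = random.Random(42)
data = make_dataset(rng=rng)
rows, cols = insert_bicluster(data, rng)
print(len(rows), len(cols))
```

Inside the inserted submatrix, the difference between any column and the base column is the same for every row, which is the additive coherence pattern described above. Shifted values may fall slightly outside the range 1 to 100.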

GDS2587 is a microarray gene expression dataset, with 2792 genes and
7 samples, collected from the organism E. coli. We removed every gene with 
missing data in any sample, and the data was normalized by mean centralization, 
as usual in gene expression data analysis m- In this dataset we aim to validate 
our contribution when devoted to microarray gene expression data analysis, as 
it is considered a relevant application of biclustering methods. 

FOOD is a dataset with 961 objects, which represent different foods, and
7 attributes, which represent nutritional information. As the values of each
attribute are in different ranges, we used the same pre-processing as Veroneze
et al. [5]. In this dataset our goal is to illustrate the usefulness of
bicluster aggregation in a different scenario and to verify if the aggregation
leaves uncovered areas that the enumeration has covered at first.

^ http://www.ncbi.nlm.nih.gov/sites/GDSbrowser?acc=GDS2587 
http://www.ntwrks.com/chartla.htm 
5.1 Experiments on Artificial Datasets


Our goal is to verify the impact of noise on the enumeration of biclusters, and
how the aggregation can improve the quality of the final results. To this end, we
add a Gaussian noise with $\mu = 0$ and $\sigma \in \{0, 0.01, \ldots, 1\}$ to
each dataset, and then run the RIn-Close algorithm. This procedure is repeated
30 times and all reported values are the average of these 30 executions. We
set RIn-Close to mine coherent values biclusters, with at least 50 rows and 4
columns. Also, we use increasing values of $\epsilon$ due to the importance of
this parameter. If $\epsilon$ is too small, we may miss important biclusters
expressing more internal variance. If $\epsilon$ is too high, the biclusters may
include unexpected objects or attributes.

As we know the biclusters, we use Precision, Recall and F-score to assess
the quality of the results after the enumeration. After that, we perform the
aggregation on the results with the value of $\epsilon$ that led to an initial
Precision closest to 0.85. This value was chosen because if the Precision is too
low, the value of $\epsilon$ is allowing too many undesired objects or
attributes in the enumerated biclusters. In this case, the aggregation may not
improve the quality of the final results because its input is not of good
quality. If the Precision is too high, we will only be able to see improvements
in the reduced quantity of biclusters, even though the aggregation may increase
the Precision too.

We will consider the following algorithms as contenders: 


Triclustering [13]. We set k to the true number of biclusters. The authors
supplied the code for this algorithm.

Merging and Deleting steps of MicroCluster [3]. To parameterize this algorithm,
we ran a grid search with $\eta$ and $\gamma$ taking values in the set
{0.15, 0.1, 0.05}, getting 9 results for each run. Also, as the aggregation
step of the algorithm is composed of two steps, merging and deleting, we ran
each experiment twice: with the merging step first (MD) and with the deleting
step first (DM). Unless we want to draw attention to some particular fact, we
will report only the best result. The authors supplied the code for this
algorithm.

Single Linkage (see Section 3.1). We cut the dendrogram with the proper
quantity of biclusters: for art1 and art2, 5 biclusters; for art3, 15 biclusters.

Aggregation by Overlapping (see Section 3.2). We tested several values for the
rate of overlapping.


After getting the results for all executions of the listed algorithms, we will choose 
the best result from each one and compare them using the CE metric. 

Figure 2 shows the quantity of enumerated biclusters on the artificial datasets,
for several values of $\epsilon$. In all datasets, for every value of
$\epsilon$, the behavior is the same: as the noise increases, the quantity of
enumerated biclusters starts to increase. For art1 and art2 (Figures 2a and 2b),
we know that the real quantity of biclusters is 5, but when the noise increases,
the enumerated quantity reaches approximately 800 biclusters, depending on the
value of $\epsilon$. For art3 (Figure 2c), we can see that the quantity of
biclusters reaches high values too. At some level of noise, the number of
biclusters starts to decrease, up to the point where the algorithm is not able
to find any bicluster.

^ http://www.cs.rpi.edu/~zaki/www-new/pmwiki.php/Software/Software

Fig. 2: Quantity of enumerated biclusters by the variance of the Gaussian noise
in the artificial datasets. Each curve is parameterized by $\epsilon$.



Fig. 3: Precision and Recall for the solutions of RIn-Close, with several values
of $\epsilon$, by the variance of the Gaussian noise in the artificial datasets.
Panels: (a) art1 Precision, (b) art2 Precision, (c) art3 Precision,
(d) art1 Recall, (e) art2 Recall, (f) art3 Recall.


In Figure 3 we can see the quality of the enumeration without considering
the quantity of biclusters.

As we can see in Figure 3d, the noise has almost no interference in the recall
for art1. This means that the biclusters of this dataset are so well defined
that they are not missed even with some noise. On the other hand, when the
variance of the noise is low, Figure 3a shows that the found biclusters contain
more elements than expected. This happens because the parameter $\epsilon$ is
high, allowing some elements to be part of the biclusters even without being
part of the original solution. As the noise increases, fewer of these intruder
elements satisfy the $\epsilon$ restriction and are thus included in some
bicluster. In this dataset, the effect of the noise on the quality was not so
severe, given that the recall started to decrease only when the variance of the
noise was close to 1.

In dataset art2 the effect of noise can be better observed. Figure 3e shows
that the noise starts to affect the solutions very early. When $\epsilon = 3$,
the recall starts to decrease very soon, at $\sigma \approx 0.5$. For more
relaxed values of $\epsilon$ we can still see the decrease in the recall. Being
the most difficult, dataset art3 is the most affected by noise. Independently of
the value of $\epsilon$, RIn-Close was not able to find any biclusters beyond
some level of variance in the noise. For example, when $\epsilon = 2$, the
Precision becomes undefined after $\sigma \approx 0.4$. This happens because the
metric is not defined when the quantity of biclusters is zero. In Figure 3f we
can see that the decline of the recall starts at $\sigma \approx 0.3$ for
$\epsilon = 2$.

Now we will discuss the results of the aggregation with the previously listed
algorithms. As stated earlier, we use the results from a value of $\epsilon$
that led to an initial Precision close to 0.85. In this case, we have
$\epsilon = 6$, 4, and 3 for art1, art2 and art3, respectively.


Fig. 4: Solutions of aggregation as a function of the variance of the noise in
dataset art1 ($\epsilon = 6$). Panels: (a) Single Linkage, (b) By Overlapping.
The scale on the right refers to quantity.


Figure 4a shows the quality of the aggregation with single linkage for dataset
art1. We can see that, with the proper number of biclusters, the aggregation
was able to get an almost perfect result. The same thing happened with the
aggregation by overlapping, reported in Figure 4b. Figure 4c shows the CE metric
for all solutions of aggregation. We can see that our proposals were capable of
producing the best performance on this dataset.

Figure 5a shows the quality of the aggregation with single linkage for the
dataset art2. This time, the solution was close to the maximum achievable
performance, but not as close as it was in art1. Figure 5b shows the quality of
the aggregation by overlapping for the same dataset. The quality of this
solution is very similar to the one obtained with single linkage. Figure 5c
shows the CE metric obtained by all the methods of aggregation. Again, our
proposals outperformed the other two algorithms.

Fig. 5: Solutions of aggregation as a function of the variance of the noise in
dataset art2. Panels: (a) Single Linkage, (b) By Overlapping. The scale on the
right refers to quantity.



Fig. 6: Solutions of aggregation as a function of the variance of the noise in
dataset art3. The scale on the right refers to quantity.


Figures 6a and 6b show the quality of aggregation with single linkage and
by overlapping, respectively. We can see that this dataset is more challenging
than the previous ones. However, the aggregation was able to significantly
reduce the quantity of biclusters, while keeping a good quality. Figure 6c shows
the CE metric for all aggregation methods. Initially MicroCluster had a better
performance, but our proposals were more robust to noise, getting a better result
from about $\sigma = 0.4$ onward.

The aggregation was not only able to reduce the quantity of biclusters of the 
enumeration, but also improve the quality of the final result. Now we are going 
to verify the behavior of the aggregation in real datasets. 

5.2 Experiments on Real Datasets 

We will start with the GDS2587 dataset by running RIn-Close to enumerate its
coherent values biclusters. We set minRow = 50 and minCol = 4. When
$\epsilon < 2.8$ no biclusters were found, and when $\epsilon = 3.0$ the
quantity of biclusters was already huge. We found 23, 2,825 and 19,649
biclusters when $\epsilon = 2.8$, 2.9, and 3.0, respectively.



Fig. 7: Dendrograms of the aggregation with single linkage on GDS2587 dataset. 


Proceeding to the aggregation, Figure 7 shows the dendrograms of the
aggregation with single linkage. In this case, the cuts are straightforward,
having 2, 4, and 5 clusters respectively. The aggregation by overlapping with a
rate of 75% reached the same quantity of biclusters. We used these quantities to
parameterize the triclustering algorithm. The results of the aggregation with
MicroCluster were very similar, and they depended only on the $\gamma$
parameter. We got 7, 8 and 11 biclusters when $\gamma = 0.15$, 0.1, and 0.05,
respectively. We will now compare the results with the gene ontology enrichment
analysis. A bicluster is called 'enriched' when any ontology term gets a p-value
less than 0.01.

When $\epsilon = 2.8$, except for triclustering (only the first bicluster was
enriched), all the algorithms returned only enriched biclusters. In fact, the
four main enriched terms were always the same, sometimes in a different order
but with very close p-values.


Table 1: Enrichment analysis of one bicluster from the aggregation by
overlapping with rate of 70%, on GDS2587 dataset.

GO Term      p-val        counts     definition
GO:0044464   0.00000000   39 / 774   Any constituent part of a cell, the basic structural and functional unit of all organisms...
GO:0044444   0.00000011   19 / 608   Any constituent part of the cytoplasm, all of the contents of a cell excluding the plasma membrane...
GO:0044424   0.00000350   19 / 578   Any constituent part of the living contents of a cell; the matter contained within (but not including) the plasma membrane...
When $\epsilon = 2.9$, all algorithms returned only enriched biclusters,
including triclustering. When $\epsilon = 3$, all algorithms except for
triclustering returned only enriched biclusters. Triclustering returned 4
enriched biclusters out of 5.

Table 1 shows the main enriched terms of one bicluster from the aggregation
by overlapping after outlier removal, when $\epsilon = 2.8$. In this case, the
expert should choose which solution best fits the goal of the data analysis.

We will now proceed to the analysis of the FOOD dataset. We are going to 
verify how the aggregation changes the coverage of the dataset when compared 
to the enumeration. As the aggregation will severely reduce the quantity of 
final biclusters, it is important to see if it will leave uncovered areas that were 
previously covered. 

We replicated the experiment from Veroneze et al. [6] on this dataset and we
use $\epsilon = 1.25$ as recommended in that work. With minRow = 48,
minCol = 2 and looking for coherent values biclusters, the quantity of
enumerated biclusters for $\epsilon = 1.25$ is 8,676.



Fig. 8: Dendrogram for the aggregation with single linkage when
$\epsilon = 1.25$ on FOOD dataset.


Figure 8 shows the dendrogram of the aggregation with single linkage. We
can see that the cuts between 2 and 7 are acceptable. In fact, cutting into two
groups seems the best option, but it may be considered a small quantity of
biclusters. As the height from 4 to 5 is more pronounced, for the comparison
it seems acceptable to cut the dendrogram into 4 groups. The aggregation by
overlapping with a rate of 70% was also able to recover 4 aggregated biclusters.

MicroCluster with the deleting operation first was not able to properly
aggregate the biclusters, keeping more than 800 biclusters when $\eta = 0.15$.
This behavior is the opposite of what happened with the artificial datasets.
There, the results were more effective when the deleting operation came first.
Here, when the merging operation came first, the aggregation was able to reach
13 to 27 biclusters, depending on the $\gamma$ parameter. As the best parameters
on the artificial datasets were $\eta = \gamma = 0.15$, for the comparison we
use this parameterization with the merging operation occurring first, which
gives us 13 biclusters. For the triclustering algorithm we set k = 4, using
insider information from the aggregation by overlapping. Table 2 shows the
comparison of difference in coverage (see Eq. (3)) among the aggregated
solutions, and between each of them and the enumerated solution
Table 2: Difference in coverage of the solutions with the enumeration on FOOD
dataset.

                 Single Linkage   MicroCluster   Triclustering   RIn-Close
By Ov.           12.50%           35.50%         70.31%          9.1%
Single Linkage   -                46.60%         81.51%          20.17%
MicroCluster     -                -              45.73%          27.38%
Triclustering    -                -              -               61.33%

from RIn-Close. We can see that the triclustering algorithm produces the most
distinct solution when compared with the enumerated solution obtained with
RIn-Close, exhibiting a difference in coverage of 61.33%. The solutions from the
aggregation by overlapping and with single linkage are relatively close to each
other, as on the artificial datasets, showing a difference in coverage of 12.50%.
In the end, the closest solution to the RIn-Close results was the aggregation by
overlapping, with a difference in coverage of 9.1%. If we consider that this
solution reduced the quantity of biclusters from 8,676 to 4, the difference
in coverage of only 9.1% seems very promising.

6 Concluding Remarks and Future Work

We have compared the performance of our proposals against the most similar
proposals in the literature, using artificial and real datasets. The artificial
datasets were characterized by a controlled structure of biclusters and were
useful to show that the aggregation can severely reduce the quantity of
biclusters, while increasing the quality of the final solution. Our proposals
outperformed the compared algorithms on the first two artificial datasets, and
proved to be more robust to noise on the third artificial dataset.

We also verified whether the aggregation could get enriched biclusters in the
case of a gene expression dataset. For different values of $\epsilon$ on the
RIn-Close algorithm, we could see that the different methods of aggregation
reached very similar results. The main challenge of the aggregation with single
linkage is to decide where to cut the dendrogram, but as we could see, this task
was straightforward on the tested datasets. Except for triclustering, all
aggregations returned only enriched biclusters. Finally, we applied the
aggregation methods to the FOOD dataset and analyzed how the aggregation changed
the coverage area when compared to the enumeration without aggregation.
Triclustering led to the most distinct result, and the aggregation by
overlapping covered an area very similar to the area covered by the enumeration.

We can conclude that the aggregation is strongly recommended when
enumerating all biclusters from a dataset. The aggregation will not only
significantly reduce the quantity of biclusters, but will also reduce the
fragmentation and increase the quality of the final result. A post-processing
step for outlier removal brings additional robustness to the methodology. As a
further step of the research, we can adapt our proposals to work on an ensemble
configuration. We can also extend this work to deal with time series biclusters,
which require contiguous attributes.

The authors would like to thank CAPES and CNPq for the financial support. 